JAVA as a Basis for Parallel Data Mining in Workstation Clusters

Authors

  • Matthias Gimbel
  • Michael Philippsen
  • Bernhard Haumacher
  • Peter C. Lockemann
  • Walter F. Tichy
Abstract

The exploitation of hidden information from large datasets by means of data mining techniques suffers from long response times. We address this problem by using the processing power of workstation clusters and have studied the performance of OLAP queries as a first step towards a portable data mining platform. The results of our study suggest that with the availability of parallel workstation clusters that are equipped with high performance communication networks, fine-grained and communication-intensive parallelizations of queries are promising, even though they are considered too costly in traditional database systems. The paper describes our Java framework for parallel OLAP-type query execution and the necessary optimizations to the standard Java implementation, and it analyzes the performance of non-standard parallel execution schemes on a workstation cluster.

Similar Resources

Exploiting idle cycles to execute data mining applications on clusters of PCs

In this paper we present and evaluate Inhambu, a distributed object-oriented system that supports the execution of data mining applications on clusters of PCs and workstations. This system provides a resource management layer, built on top of Java/RMI, that supports the execution of the data mining tool Weka. We evaluate the performance of Inhambu by means of several experiments in h...

Data mining on PC cluster connected with storage area network: its preliminary experimental results

Personal computer/workstation (PC/WS) clusters have recently become a hot research topic in the field of parallel and distributed computing. They are expected to play an important role as large-scale computer systems, such as large server sites and/or high performance parallel computers, because of their good scalability and cost-performance ratio. From the viewpoint of applications, data inte...

Implementation and Evaluation of Parallel Data Mining on PC Cluster and Optimization of its Execution Environments

Personal computer/workstation clusters have been studied intensively in the field of parallel and distributed computing. From the viewpoint of applications, data-intensive applications such as data mining and ad hoc query processing in databases are considered very important for high performance computing, as well as conventional scientific calculations. We have built and evaluated PC cluster pil...

Preliminary Experimental Results of a Parallel Association Rule Mining on ATM Connected PC Clusters

Until recently, workstations were overwhelmingly superior to personal computers in terms of performance. However, recent PC technology has dramatically improved CPU, main memory, and cache memory performance. Therefore, massively parallel computer systems are moving away from proprietary components such as CPUs and disks towards commodity parts. As far as applications are concerned, we believe...

Using Available Remote Memory Dynamically for Parallel Data Mining Application on ATM-Connected PC Cluster

Personal computer/workstation (PC/WS) clusters are promising candidates for future high performance computers because of their good scalability and cost-performance ratio. Data-intensive applications, such as data mining and ad hoc query processing in databases, are considered very important for massively parallel processors, as well as conventional scientific calculations. Thus, investigating...

Publication date: 1999